What are the baselines for protein fold recognition?

Authors

  • Liam J. McGuffin
  • Kevin Bryson
  • David T. Jones
Abstract

MOTIVATION: What constitutes a baseline level of success for protein fold recognition methods? Fold recognition benchmarks are often presented without any thought to the results that might be expected from a purely random set of predictions, so an analysis of fold recognition baselines is long overdue. Given varying amounts of basic information about a protein, ranging from the length of the sequence to knowledge of its secondary structure, to what extent can the fold be determined by intelligent guesswork? Can simple methods that make use of secondary structure information assign folds more accurately than purely random methods, and could these methods be used to construct viable hierarchical classifications?

EXPERIMENTS PERFORMED: A number of rapid automatic methods that score similarities between protein domains were devised and tested. These methods ranged from those that incorporated no secondary structure information, such as measuring absolute differences in sequence lengths, to more complex alignments of secondary structure elements. Each method was assessed for accuracy by comparison with the Class Architecture Topology Homology (CATH) classification. Methods were rated against both a random baseline fold assignment method as a lower control and FSSP as an upper control. Similarity trees were constructed in order to evaluate the accuracy of the optimum methods at producing a classification of structure.

RESULTS: Using a rigorous comparison of methods with CATH, the random fold assignment method set a lower baseline of 11% true positives at 3% false positives, and FSSP set an upper benchmark of 47% true positives at 3% false positives. The optimum secondary structure alignment method used here achieved 27% true positives at 3% false positives. Using a less rigorous Critical Assessment of Structure Prediction (CASP)-like sensitivity measurement, the random assignment achieved 6%, FSSP 59% and the optimum secondary structure alignment method 32%. Similarity trees produced by the optimum method illustrate that these methods cannot be used alone to produce a viable protein structural classification system.

CONCLUSIONS: Simple methods that use perfect secondary structure information to assign folds cannot produce an accurate protein taxonomy; however, they do provide useful baselines for fold recognition. In terms of a typical CASP assessment, our results suggest that approximately 6% of targets with folds in the databases could be assigned correctly by random guessing, and as many as 32% could be recognised by trivial secondary structure comparison methods, given knowledge of their correct secondary structures.
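
The simplest kind of method the abstract mentions, scoring two domains by the absolute difference in their sequence lengths, and the "true positives at 3% false positives" measurement can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy domain identifiers, lengths and fold labels are all assumptions made for illustration, and the thresholding procedure is one plausible reading of how such a rate could be computed.

```python
from itertools import combinations


def length_difference_score(len_a, len_b):
    """Similarity from sequence lengths only: 0 for equal lengths,
    increasingly negative as the lengths diverge."""
    return -abs(len_a - len_b)


def true_positive_rate_at_fpr(scored_pairs, max_fpr=0.03):
    """Fraction of same-fold pairs recovered before the false positive
    rate exceeds max_fpr.

    scored_pairs is a list of (score, same_fold) tuples, where same_fold
    is True when the reference classification puts both domains in the
    same fold. Pairs with tied scores are accepted or rejected together.
    """
    positives = sum(1 for _, same in scored_pairs if same)
    negatives = len(scored_pairs) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one same-fold and one different-fold pair")

    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best_rate = 0.0
    i = 0
    while i < len(ranked):
        # Consume every pair that shares the current score threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / negatives > max_fpr:
            break
        best_rate = tp / positives
        i = j
    return best_rate


# Hypothetical toy data: (domain id, sequence length, fold label).
domains = [
    ("d1", 100, "A"),
    ("d2", 102, "A"),
    ("d3", 250, "B"),
    ("d4", 255, "B"),
    ("d5", 180, "C"),
]

# Score every pair of domains and record whether the folds agree.
pairs = [
    (length_difference_score(a[1], b[1]), a[2] == b[2])
    for a, b in combinations(domains, 2)
]

rate = true_positive_rate_at_fpr(pairs, max_fpr=0.03)
print(f"True positive rate at 3% false positives: {rate:.2f}")
```

On real data the same evaluation function could be reused with any pairwise scoring method, including a random score as the lower control, which is what makes results from different methods comparable at a fixed false positive rate.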

Journal:
  • Bioinformatics

Volume 17, Issue 1

Pages: -

Publication date: 2001